<!DOCTYPE html>
<html>
  <head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8">
  <meta content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=0" name="viewport">
  <meta name="description" content="刘清政">
  <meta name="keyword" content="hexo-theme">
  
    <link rel="shortcut icon" href="/css/images/logo.png">
  
  <title>
    
      db/Elasticsearch系列/9-11文档操作/13-Elasticsearch - 分析过程 | Justin-刘清政的博客
    
  </title>
  <link href="//cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css" rel="stylesheet">
  <link href="//cdnjs.cloudflare.com/ajax/libs/nprogress/0.2.0/nprogress.min.css" rel="stylesheet">
  <link href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/tomorrow.min.css" rel="stylesheet">
  
<link rel="stylesheet" href="/css/style.css">

  
    
<link rel="stylesheet" href="/css/plugins/gitment.css">

  
  <script src="//cdnjs.cloudflare.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script>
  <script src="//cdnjs.cloudflare.com/ajax/libs/geopattern/1.2.3/js/geopattern.min.js"></script>
  <script src="//cdnjs.cloudflare.com/ajax/libs/nprogress/0.2.0/nprogress.min.js"></script>
  
    
<script src="/js/qrious.js"></script>

  
  
    
<script src="/js/gitment.js"></script>

  
  

  
<meta name="generator" content="Hexo 4.2.0"></head>
<div class="wechat-share">
  <img src="/css/images/logo.png" />
</div>

  <body>
    <header class="header fixed-header">
  <div class="header-container">
    <a class="home-link" href="/">
      <div class="logo"></div>
      <span>Justin-刘清政的博客</span>
    </a>
    <ul class="right-list">
      
        <li class="list-item">
          
            <a href="/" class="item-link">主页</a>
          
        </li>
      
        <li class="list-item">
          
            <a href="/tags/" class="item-link">标签</a>
          
        </li>
      
        <li class="list-item">
          
            <a href="/archives/" class="item-link">归档</a>
          
        </li>
      
        <li class="list-item">
          
            <a href="/about/" class="item-link">关于我</a>
          
        </li>
      
    </ul>
    <div class="menu">
      <span class="icon-bar"></span>
      <span class="icon-bar"></span>
      <span class="icon-bar"></span>
    </div>
    <div class="menu-mask">
      <ul class="menu-list">
        
          <li class="menu-item">
            
              <a href="/" class="menu-link">主页</a>
            
          </li>
        
          <li class="menu-item">
            
              <a href="/tags/" class="menu-link">标签</a>
            
          </li>
        
          <li class="menu-item">
            
              <a href="/archives/" class="menu-link">归档</a>
            
          </li>
        
          <li class="menu-item">
            
              <a href="/about/" class="menu-link">关于我</a>
            
          </li>
        
      </ul>
    </div>
  </div>
</header>

    <div id="article-banner">
  <h2>db/Elasticsearch系列/9-11文档操作/13-Elasticsearch - 分析过程</h2>



  <p class="post-date">2020-03-10</p>
    <!-- 不蒜子统计 -->
    <span id="busuanzi_container_page_pv" style='display:none' class="">
        <i class="icon-smile icon"></i> 阅读数：<span id="busuanzi_value_page_pv"></span>次
    </span>
  <div class="arrow-down">
    <a href="javascript:;"></a>
  </div>
</div>
<main class="app-body flex-box">
  <!-- Article START -->
  <article class="post-article">
    <section class="markdown-content"><h1 id="13-Elasticsearch-分析过程"><a href="#13-Elasticsearch-分析过程" class="headerlink" title="13-Elasticsearch - 分析过程"></a>13-Elasticsearch - 分析过程</h1><h2 id="一-前言"><a href="#一-前言" class="headerlink" title="一 前言"></a>一 前言</h2><p>现在，我们已经了解了如何建立索引和搜索数据了。<br> 那么，是时候来探索背后的故事了！当数据传递到<code>elasticsearch</code>后，到底发生了什么？</p>
<h2 id="二分析过程"><a href="#二分析过程" class="headerlink" title="二分析过程"></a>二分析过程</h2><p>当数据被发送到<code>elasticsearch</code>后并加入到倒排索引之前，<code>elasticsearch</code>会对该文档的进行一系列的处理步骤：</p>
<ul>
<li>字符过滤：使用字符过滤器转变字符。</li>
<li>文本切分为分词：将文本（档）分为单个或多个分词。</li>
<li>分词过滤：使用分词过滤器转变每个分词。</li>
<li>分词索引：最终将分词存储在Lucene倒排索引中。</li>
</ul>
<p>整体流程如下图所示：</p>
<p><img src="https://tva1.sinaimg.cn/large/00831rSTly1gco5c5qxnbj311v0u0ahk.jpg" alt="image-20200310003447410"></p>
<p>接下来，我们简要的介绍<code>elasticsearch</code>中的分析器、分词器和分词过滤器。它们配置简单，灵活好用，我们可以通过不同的组合来获取我们想要的分词！</p>
<p>是的，无论多么复杂的分析过程，都是为了获取更加人性化的分词！<br> 接下来，我们来看看其中，在整个分析过程的各个组件吧。</p>
<h2 id="三分析器"><a href="#三分析器" class="headerlink" title="三分析器"></a>三分析器</h2><p>在elasticsearch中，一个分析器可以包括：</p>
<ul>
<li>可选的字符过滤器</li>
<li>一个分词器</li>
<li>0个或多个分词过滤器</li>
</ul>
<p>接下来简要的介绍各内置分词的大致情况。在介绍之前，为了方便演示。如果你已经按照之前的教程安装了<code>ik analysis</code>，现在请暂时将该插件移出<code>plugins</code>目录。</p>
<h3 id="3-1-标准分析器：standard-analyzer"><a href="#3-1-标准分析器：standard-analyzer" class="headerlink" title="3.1 标准分析器：standard analyzer"></a>3.1 标准分析器：standard analyzer</h3><p>标准分析器（standard analyzer）：是elasticsearch的默认分析器，该分析器综合了大多数欧洲语言来说合理的默认模块，包括标准分词器、标准分词过滤器、小写转换分词过滤器和停用词分词过滤器。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;analyzer&quot;: &quot;standard&quot;,</span><br><span class="line">  &quot;text&quot;:&quot;To be or not to be,  That is a question ———— 莎士比亚&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>分词结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;to&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 2,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;be&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 3,</span><br><span class="line">      &quot;end_offset&quot; : 5,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 1</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;or&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 6,</span><br><span class="line">      &quot;end_offset&quot; : 8,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 2</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;not&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 9,</span><br><span class="line">      &quot;end_offset&quot; : 12,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 3</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;to&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 13,</span><br><span class="line">      &quot;end_offset&quot; : 15,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 4</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;be&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 16,</span><br><span class="line">      &quot;end_offset&quot; : 18,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 5</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;that&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 21,</span><br><span class="line">      &quot;end_offset&quot; : 25,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 6</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;is&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 26,</span><br><span class="line">      &quot;end_offset&quot; : 28,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 7</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;a&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 29,</span><br><span class="line">      &quot;end_offset&quot; : 30,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 8</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;question&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 31,</span><br><span class="line">      &quot;end_offset&quot; : 39,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 9</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;莎&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 45,</span><br><span class="line">      &quot;end_offset&quot; : 46,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 10</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;士&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 46,</span><br><span class="line">      &quot;end_offset&quot; : 47,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 11</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;比&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 47,</span><br><span class="line">      &quot;end_offset&quot; : 48,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 12</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;亚&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 48,</span><br><span class="line">      &quot;end_offset&quot; : 49,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 13</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<h3 id="3-2-简单分析器：simple-analyzer"><a href="#3-2-简单分析器：simple-analyzer" class="headerlink" title="3.2 简单分析器：simple analyzer"></a>3.2 简单分析器：simple analyzer</h3><p>简单分析器（simple analyzer）：简单分析器仅使用了小写转换分词，这意味着在非字母处进行分词，并将分词自动转换为小写。这个分词器对于亚种语言来说效果不佳，因为亚洲语言不是根据空白来分词的，所以一般用于欧洲言中。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;analyzer&quot;: &quot;simple&quot;,</span><br><span class="line">  &quot;text&quot;:&quot;To be or not to be,  That is a question ———— 莎士比亚&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>分词结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;to&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 2,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;be&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 3,</span><br><span class="line">      &quot;end_offset&quot; : 5,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 1</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;or&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 6,</span><br><span class="line">      &quot;end_offset&quot; : 8,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 2</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;not&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 9,</span><br><span class="line">      &quot;end_offset&quot; : 12,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 3</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;to&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 13,</span><br><span class="line">      &quot;end_offset&quot; : 15,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 4</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;be&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 16,</span><br><span class="line">      &quot;end_offset&quot; : 18,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 5</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;that&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 21,</span><br><span class="line">      &quot;end_offset&quot; : 25,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 6</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;is&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 26,</span><br><span class="line">      &quot;end_offset&quot; : 28,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 7</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;a&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 29,</span><br><span class="line">      &quot;end_offset&quot; : 30,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 8</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;question&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 31,</span><br><span class="line">      &quot;end_offset&quot; : 39,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 9</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;莎士比亚&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 45,</span><br><span class="line">      &quot;end_offset&quot; : 49,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 10</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<h3 id="3-3-空白分析器：whitespace-analyzer"><a href="#3-3-空白分析器：whitespace-analyzer" class="headerlink" title="3.3 空白分析器：whitespace analyzer"></a>3.3 空白分析器：whitespace analyzer</h3><p>空白（格）分析器（whitespace analyzer）：这玩意儿只是根据空白将文本切分为若干分词，真是有够偷懒！</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;analyzer&quot;: &quot;whitespace&quot;,</span><br><span class="line">  &quot;text&quot;:&quot;To be or not to be,  That is a question ———— 莎士比亚&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>分词结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;To&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 2,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;be&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 3,</span><br><span class="line">      &quot;end_offset&quot; : 5,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 1</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;or&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 6,</span><br><span class="line">      &quot;end_offset&quot; : 8,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 2</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;not&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 9,</span><br><span class="line">      &quot;end_offset&quot; : 12,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 3</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;to&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 13,</span><br><span class="line">      &quot;end_offset&quot; : 15,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 4</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;be,&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 16,</span><br><span class="line">      &quot;end_offset&quot; : 19,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 5</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;That&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 21,</span><br><span class="line">      &quot;end_offset&quot; : 25,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 6</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;is&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 26,</span><br><span class="line">      &quot;end_offset&quot; : 28,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 7</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;a&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 29,</span><br><span class="line">      &quot;end_offset&quot; : 30,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 8</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;question&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 31,</span><br><span class="line">      &quot;end_offset&quot; : 39,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 9</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;————&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 40,</span><br><span class="line">      &quot;end_offset&quot; : 44,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 10</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;莎士比亚&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 45,</span><br><span class="line">      &quot;end_offset&quot; : 49,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 11</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<h3 id="3-4-停用词分析器：stop-analyzer"><a href="#3-4-停用词分析器：stop-analyzer" class="headerlink" title="3.4 停用词分析器：stop analyzer"></a>3.4 停用词分析器：stop analyzer</h3><p><a href="https://baike.baidu.com/item/停用词/4531676" target="_blank" rel="noopener">停用词</a>分析（stop analyzer）和简单分析器的行为很像，只是在分词流中额外的过滤了停用词。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;analyzer&quot;: &quot;stop&quot;,</span><br><span class="line">  &quot;text&quot;:&quot;To be or not to be,  That is a question ———— 莎士比亚&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>结果也很简单：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;question&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 31,</span><br><span class="line">      &quot;end_offset&quot; : 39,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 9</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;莎士比亚&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 45,</span><br><span class="line">      &quot;end_offset&quot; : 49,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 10</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<h3 id="3-5-关键词分析器：keyword-analyzer"><a href="#3-5-关键词分析器：keyword-analyzer" class="headerlink" title="3.5 关键词分析器：keyword analyzer"></a>3.5 关键词分析器：keyword analyzer</h3><p>关键词分析器（keyword analyzer）将整个字段当做单独的分词，如无必要，我们不在映射中使用关键词分析器。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;analyzer&quot;: &quot;keyword&quot;,</span><br><span class="line">  &quot;text&quot;:&quot;To be or not to be,  That is a question ———— 莎士比亚&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;To be or not to be,  That is a question ———— 莎士比亚&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 49,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>说的一点没错，分析结果是将整段当做单独的分词。</p>
<h3 id="3-6-模式分析器：pattern-analyzer"><a href="#3-6-模式分析器：pattern-analyzer" class="headerlink" title="3.6 模式分析器：pattern analyzer"></a>3.6 模式分析器：pattern analyzer</h3><p>模式分析器（pattern analyzer）允许我们指定一个分词切分模式。但是通常更佳的方案是使用定制的分析器，组合现有的模式分词器和所需要的分词过滤器更加合适。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;analyzer&quot;: &quot;pattern&quot;,</span><br><span class="line">  &quot;explain&quot;: false, </span><br><span class="line">  &quot;text&quot;:&quot;To be or not to be,  That is a question ———— 莎士比亚&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;to&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 2,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;be&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 3,</span><br><span class="line">      &quot;end_offset&quot; : 5,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 1</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;or&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 6,</span><br><span class="line">      &quot;end_offset&quot; : 8,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 2</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;not&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 9,</span><br><span class="line">      &quot;end_offset&quot; : 12,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 3</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;to&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 13,</span><br><span class="line">      &quot;end_offset&quot; : 15,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 4</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;be&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 16,</span><br><span class="line">      &quot;end_offset&quot; : 18,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 5</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;that&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 21,</span><br><span class="line">      &quot;end_offset&quot; : 25,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 6</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;is&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 26,</span><br><span class="line">      &quot;end_offset&quot; : 28,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 7</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;a&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 29,</span><br><span class="line">      &quot;end_offset&quot; : 30,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 8</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;question&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 31,</span><br><span class="line">      &quot;end_offset&quot; : 39,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 9</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>我们来自定制一个模式分析器，比如我们写匹配邮箱的正则。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">PUT pattern_test</span><br><span class="line">&#123;</span><br><span class="line">  &quot;settings&quot;: &#123;</span><br><span class="line">    &quot;analysis&quot;: &#123;</span><br><span class="line">      &quot;analyzer&quot;: &#123;</span><br><span class="line">        &quot;my_email_analyzer&quot;:&#123;</span><br><span class="line">          &quot;type&quot;:&quot;pattern&quot;,</span><br><span class="line">          &quot;pattern&quot;:&quot;\\W|_&quot;,</span><br><span class="line">          &quot;lowercase&quot;:true</span><br><span class="line">        &#125;</span><br><span class="line">      &#125;</span><br><span class="line">    &#125;</span><br><span class="line">  &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>上例中，我们在创建一条索引的时候，配置分析器为自定义的分析器。</p>
<p>需要注意的是，在<code>json</code>字符串中，正则的斜杠需要转义。</p>
<p>我们使用自定义的分析器来查询。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">POST pattern_test&#x2F;_analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;analyzer&quot;: &quot;my_email_analyzer&quot;,</span><br><span class="line">  &quot;text&quot;: &quot;John_Smith@foo-bar.com&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;john&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 4,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;smith&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 5,</span><br><span class="line">      &quot;end_offset&quot; : 10,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 1</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;foo&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 11,</span><br><span class="line">      &quot;end_offset&quot; : 14,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 2</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;bar&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 15,</span><br><span class="line">      &quot;end_offset&quot; : 18,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 3</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;com&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 19,</span><br><span class="line">      &quot;end_offset&quot; : 22,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 4</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<h3 id="3-7-语言和多语言分析器：chinese"><a href="#3-7-语言和多语言分析器：chinese" class="headerlink" title="3.7 语言和多语言分析器：chinese"></a>3.7 语言和多语言分析器：chinese</h3><p>elasticsearch为很多世界流行语言提供良好的、简单的、开箱即用的语言分析器集合：阿拉伯语、亚美尼亚语、巴斯克语、巴西语、保加利亚语、加泰罗尼亚语、中文、捷克语、丹麦、荷兰语、英语、芬兰语、法语、加里西亚语、德语、希腊语、北印度语、匈牙利语、印度尼西亚、爱尔兰语、意大利语、日语、韩国语、库尔德语、挪威语、波斯语、葡萄牙语、罗马尼亚语、俄语、西班牙语、瑞典语、土耳其语和泰语。</p>
<p>我们可以指定其中之一的语言来指定特定的语言分析器，但必须是小写的名字！如果你要分析的语言不在上述集合中，可能还需要搭配相应的插件支持。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;analyzer&quot;: &quot;chinese&quot;,</span><br><span class="line">  &quot;text&quot;:&quot;To be or not to be,  That is a question ———— 莎士比亚&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;question&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 31,</span><br><span class="line">      &quot;end_offset&quot; : 39,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 9</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;莎&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 45,</span><br><span class="line">      &quot;end_offset&quot; : 46,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 10</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;士&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 46,</span><br><span class="line">      &quot;end_offset&quot; : 47,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 11</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;比&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 47,</span><br><span class="line">      &quot;end_offset&quot; : 48,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 12</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;亚&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 48,</span><br><span class="line">      &quot;end_offset&quot; : 49,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 13</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>也可以是别语言：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;analyzer&quot;: &quot;french&quot;,</span><br><span class="line">  &quot;text&quot;:&quot;Je suis ton père&quot;</span><br><span class="line">&#125;</span><br><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;analyzer&quot;: &quot;german&quot;,</span><br><span class="line">  &quot;text&quot;:&quot;Ich bin dein vater&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<h3 id="3-8-雪球分析器：snowball-analyzer"><a href="#3-8-雪球分析器：snowball-analyzer" class="headerlink" title="3.8 雪球分析器：snowball analyzer"></a>3.8 雪球分析器：snowball analyzer</h3><p>雪球分析器（snowball analyzer）除了使用标准的分词和分词过滤器（和标准分析器一样）也是用了小写分词过滤器和停用词过滤器，除此之外，它还是用了雪球词干器对文本进行<a href="https://baike.baidu.com/item/词干" target="_blank" rel="noopener">词干</a>提取。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;analyzer&quot;: &quot;snowball&quot;,</span><br><span class="line">  &quot;text&quot;:&quot;To be or not to be,  That is a question ———— 莎士比亚&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;question&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 31,</span><br><span class="line">      &quot;end_offset&quot; : 39,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 9</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;莎&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 45,</span><br><span class="line">      &quot;end_offset&quot; : 46,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 10</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;士&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 46,</span><br><span class="line">      &quot;end_offset&quot; : 47,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 11</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;比&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 47,</span><br><span class="line">      &quot;end_offset&quot; : 48,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 12</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;亚&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 48,</span><br><span class="line">      &quot;end_offset&quot; : 49,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 13</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<h2 id="四-字符过滤器"><a href="#四-字符过滤器" class="headerlink" title="四 字符过滤器"></a>四 字符过滤器</h2><p>字符过滤器在``属性中定义，它是对字符流进行处理。字符过滤器种类不多。elasticearch只提供了三种字符过滤器：</p>
<ul>
<li>HTML字符过滤器（HTML Strip Char Filter）</li>
<li>映射字符过滤器（Mapping Char Filter）</li>
<li>模式替换过滤器（Pattern Replace Char Filter）</li>
</ul>
<p>我们来分别看看都是怎么玩的吧！</p>
<h3 id="4-1-HTML字符过滤器"><a href="#4-1-HTML字符过滤器" class="headerlink" title="4.1 HTML字符过滤器"></a>4.1 HTML字符过滤器</h3><p>HTML字符过滤器（HTML Strip Char Filter）从文本中去除HTML元素。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;tokenizer&quot;: &quot;keyword&quot;,</span><br><span class="line">  &quot;char_filter&quot;: [&quot;html_strip&quot;],</span><br><span class="line">  &quot;text&quot;:&quot;&lt;p&gt;I&amp;apos;m so &lt;b&gt;happy&lt;&#x2F;b&gt;!&lt;&#x2F;p&gt;&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;&quot;&quot;</span><br><span class="line"></span><br><span class="line">I&#39;m so happy!</span><br><span class="line"></span><br><span class="line">&quot;&quot;&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 32,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<h3 id="4-2-映射字符过滤器"><a href="#4-2-映射字符过滤器" class="headerlink" title="4.2 映射字符过滤器"></a>4.2 映射字符过滤器</h3><p>映射字符过滤器（Mapping Char Filter）接收键值的映射，每当遇到与键相同的字符串时，它就用该键关联的值替换它们。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line">PUT pattern_test4</span><br><span class="line">&#123;</span><br><span class="line">  &quot;settings&quot;: &#123;</span><br><span class="line">    &quot;analysis&quot;: &#123;</span><br><span class="line">      &quot;analyzer&quot;: &#123;</span><br><span class="line">        &quot;my_analyzer&quot;:&#123;</span><br><span class="line">          &quot;tokenizer&quot;:&quot;keyword&quot;,</span><br><span class="line">          &quot;char_filter&quot;:[&quot;my_char_filter&quot;]</span><br><span class="line">        &#125;</span><br><span class="line">      &#125;,</span><br><span class="line">      &quot;char_filter&quot;:&#123;</span><br><span class="line">          &quot;my_char_filter&quot;:&#123;</span><br><span class="line">            &quot;type&quot;:&quot;mapping&quot;,</span><br><span class="line">            &quot;mappings&quot;:[&quot;苍井空 &#x3D;&gt; 666&quot;,&quot;武藤兰 &#x3D;&gt; 888&quot;]</span><br><span class="line">          &#125;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">  &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>上例中，我们自定义了一个分析器，其内的分词器使用关键字分词器，字符过滤器则是自定制的，将字符中的苍井空替换为666，武藤兰替换为888。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">POST pattern_test4&#x2F;_analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;analyzer&quot;: &quot;my_analyzer&quot;,</span><br><span class="line">  &quot;text&quot;: &quot;苍井空热爱武藤兰，可惜后来苍井空结婚了&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;666热爱888，可惜后来666结婚了&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 19,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<h3 id="4-3-模式替换过滤器"><a href="#4-3-模式替换过滤器" class="headerlink" title="4.3 模式替换过滤器"></a>4.3 模式替换过滤器</h3><p>模式替换过滤器（Pattern Replace Char Filter）使用正则表达式匹配并替换字符串中的字符。但要小心你写的抠脚的正则表达式。因为这可能导致性能变慢！</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line">PUT pattern_test5</span><br><span class="line">&#123;</span><br><span class="line">  &quot;settings&quot;: &#123;</span><br><span class="line">    &quot;analysis&quot;: &#123;</span><br><span class="line">      &quot;analyzer&quot;: &#123;</span><br><span class="line">        &quot;my_analyzer&quot;: &#123;</span><br><span class="line">          &quot;tokenizer&quot;: &quot;standard&quot;,</span><br><span class="line">          &quot;char_filter&quot;: [</span><br><span class="line">            &quot;my_char_filter&quot;</span><br><span class="line">          ]</span><br><span class="line">        &#125;</span><br><span class="line">      &#125;,</span><br><span class="line">      &quot;char_filter&quot;: &#123;</span><br><span class="line">        &quot;my_char_filter&quot;: &#123;</span><br><span class="line">          &quot;type&quot;: &quot;pattern_replace&quot;,</span><br><span class="line">          &quot;pattern&quot;: &quot;(\\d+)-(?&#x3D;\\d)&quot;,</span><br><span class="line">          &quot;replacement&quot;: &quot;$1_&quot;</span><br><span class="line">        &#125;</span><br><span class="line">      &#125;</span><br><span class="line">    &#125;</span><br><span class="line">  &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>上例中，我们自定义了一个正则规则。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">POST pattern_test5&#x2F;_analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;analyzer&quot;: &quot;my_analyzer&quot;,</span><br><span class="line">  &quot;text&quot;: &quot;My credit card is 123-456-789&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;My&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 2,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;credit&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 3,</span><br><span class="line">      &quot;end_offset&quot; : 9,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 1</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;card&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 10,</span><br><span class="line">      &quot;end_offset&quot; : 14,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 2</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;is&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 15,</span><br><span class="line">      &quot;end_offset&quot; : 17,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 3</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;123_456_789&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 18,</span><br><span class="line">      &quot;end_offset&quot; : 29,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;NUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 4</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>我们大致的了解elasticsearch分析处理数据的流程。但可以看到的是，我们极少地在例子中演示中文处理。因为elasticsearch内置的分析器处理起来中文不是很好。所以，后续会介绍一个重量级的插件就是elasticsearch analysis ik（一般习惯称呼为ik分词器）。</p>
<h2 id="五-分词器"><a href="#五-分词器" class="headerlink" title="五 分词器"></a>五 分词器</h2><p>由于elasticsearch内置了分析器，它同样也包含了分词器。分词器，顾名思义，主要的操作是将文本字符串分解为小块，而这些小块这被称为分词<code>token</code>。</p>
<h3 id="5-1-标准分词器：standard-tokenizer"><a href="#5-1-标准分词器：standard-tokenizer" class="headerlink" title="5.1 标准分词器：standard tokenizer"></a>5.1 标准分词器：standard tokenizer</h3><p>标准分词器（standard tokenizer）是一个基于语法的分词器，对于大多数欧洲语言来说还是不错的，它同时还处理了Unicode文本的分词，但分词默认的最大长度是255字节，它也移除了逗号和句号这样的标点符号。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;tokenizer&quot;: &quot;standard&quot;,</span><br><span class="line">  &quot;text&quot;:&quot;To be or not to be,  That is a question ———— 莎士比亚&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;To&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 2,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;be&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 3,</span><br><span class="line">      &quot;end_offset&quot; : 5,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 1</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;or&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 6,</span><br><span class="line">      &quot;end_offset&quot; : 8,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 2</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;not&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 9,</span><br><span class="line">      &quot;end_offset&quot; : 12,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 3</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;to&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 13,</span><br><span class="line">      &quot;end_offset&quot; : 15,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 4</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;be&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 16,</span><br><span class="line">      &quot;end_offset&quot; : 18,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 5</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;That&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 21,</span><br><span class="line">      &quot;end_offset&quot; : 25,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 6</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;is&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 26,</span><br><span class="line">      &quot;end_offset&quot; : 28,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 7</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;a&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 29,</span><br><span class="line">      &quot;end_offset&quot; : 30,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 8</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;question&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 31,</span><br><span class="line">      &quot;end_offset&quot; : 39,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 9</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;莎&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 45,</span><br><span class="line">      &quot;end_offset&quot; : 46,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 10</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;士&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 46,</span><br><span class="line">      &quot;end_offset&quot; : 47,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 11</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;比&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 47,</span><br><span class="line">      &quot;end_offset&quot; : 48,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 12</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;亚&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 48,</span><br><span class="line">      &quot;end_offset&quot; : 49,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 13</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<h3 id="5-2-关键词分词器：keyword-tokenizer"><a href="#5-2-关键词分词器：keyword-tokenizer" class="headerlink" title="5.2 关键词分词器：keyword tokenizer"></a>5.2 关键词分词器：keyword tokenizer</h3><p>关键词分词器（keyword tokenizer）是一种简单的分词器，将整个文本作为单个的分词，提供给分词过滤器，当你只想用分词过滤器，而不做分词操作时，它是不错的选择。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;tokenizer&quot;: &quot;keyword&quot;,</span><br><span class="line">  &quot;text&quot;:&quot;To be or not to be,  That is a question ———— 莎士比亚&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;To be or not to be,  That is a question ———— 莎士比亚&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 49,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<h3 id="5-3-字母分词器：letter-tokenizer"><a href="#5-3-字母分词器：letter-tokenizer" class="headerlink" title="5.3 字母分词器：letter tokenizer"></a>5.3 字母分词器：letter tokenizer</h3><p>字母分词器（letter tokenizer）根据非字母的符号，将文本切分成分词。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;tokenizer&quot;: &quot;letter&quot;,</span><br><span class="line">  &quot;text&quot;:&quot;To be or not to be,  That is a question ———— 莎士比亚&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;To&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 2,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;be&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 3,</span><br><span class="line">      &quot;end_offset&quot; : 5,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 1</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;or&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 6,</span><br><span class="line">      &quot;end_offset&quot; : 8,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 2</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;not&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 9,</span><br><span class="line">      &quot;end_offset&quot; : 12,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 3</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;to&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 13,</span><br><span class="line">      &quot;end_offset&quot; : 15,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 4</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;be&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 16,</span><br><span class="line">      &quot;end_offset&quot; : 18,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 5</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;That&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 21,</span><br><span class="line">      &quot;end_offset&quot; : 25,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 6</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;is&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 26,</span><br><span class="line">      &quot;end_offset&quot; : 28,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 7</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;a&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 29,</span><br><span class="line">      &quot;end_offset&quot; : 30,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 8</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;question&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 31,</span><br><span class="line">      &quot;end_offset&quot; : 39,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 9</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;莎士比亚&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 45,</span><br><span class="line">      &quot;end_offset&quot; : 49,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 10</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<h3 id="5-4-小写分词器：lowercase-tokenizer"><a href="#5-4-小写分词器：lowercase-tokenizer" class="headerlink" title="5.4 小写分词器：lowercase tokenizer"></a>5.4 小写分词器：lowercase tokenizer</h3><p>小写分词器（lowercase tokenizer）结合了常规的字母分词器和小写分词过滤器（跟你想的一样，就是将所有的分词转化为小写）的行为。通过一个单独的分词器来实现的主要原因是，一次进行两项操作会获得更好的性能。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;tokenizer&quot;: &quot;lowercase&quot;,</span><br><span class="line">  &quot;text&quot;:&quot;To be or not to be,  That is a question ———— 莎士比亚&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;to&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 2,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;be&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 3,</span><br><span class="line">      &quot;end_offset&quot; : 5,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 1</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;or&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 6,</span><br><span class="line">      &quot;end_offset&quot; : 8,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 2</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;not&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 9,</span><br><span class="line">      &quot;end_offset&quot; : 12,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 3</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;to&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 13,</span><br><span class="line">      &quot;end_offset&quot; : 15,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 4</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;be&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 16,</span><br><span class="line">      &quot;end_offset&quot; : 18,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 5</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;that&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 21,</span><br><span class="line">      &quot;end_offset&quot; : 25,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 6</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;is&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 26,</span><br><span class="line">      &quot;end_offset&quot; : 28,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 7</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;a&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 29,</span><br><span class="line">      &quot;end_offset&quot; : 30,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 8</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;question&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 31,</span><br><span class="line">      &quot;end_offset&quot; : 39,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 9</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;莎士比亚&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 45,</span><br><span class="line">      &quot;end_offset&quot; : 49,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 10</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<h3 id="5-5-空白分词器：whitespace-tokenizer"><a href="#5-5-空白分词器：whitespace-tokenizer" class="headerlink" title="5.5 空白分词器：whitespace tokenizer"></a>5.5 空白分词器：whitespace tokenizer</h3><p>空白分词器（whitespace tokenizer）通过空白来分隔不同的分词，空白包括空格、制表符、换行等。但是，我们需要注意的是，空白分词器不会删除任何标点符号。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;tokenizer&quot;: &quot;whitespace&quot;,</span><br><span class="line">  &quot;text&quot;:&quot;To be or not to be,  That is a question ———— 莎士比亚&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;To&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 2,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;be&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 3,</span><br><span class="line">      &quot;end_offset&quot; : 5,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 1</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;or&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 6,</span><br><span class="line">      &quot;end_offset&quot; : 8,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 2</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;not&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 9,</span><br><span class="line">      &quot;end_offset&quot; : 12,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 3</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;to&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 13,</span><br><span class="line">      &quot;end_offset&quot; : 15,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 4</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;be,&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 16,</span><br><span class="line">      &quot;end_offset&quot; : 19,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 5</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;That&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 21,</span><br><span class="line">      &quot;end_offset&quot; : 25,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 6</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;is&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 26,</span><br><span class="line">      &quot;end_offset&quot; : 28,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 7</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;a&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 29,</span><br><span class="line">      &quot;end_offset&quot; : 30,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 8</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;question&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 31,</span><br><span class="line">      &quot;end_offset&quot; : 39,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 9</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;————&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 40,</span><br><span class="line">      &quot;end_offset&quot; : 44,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 10</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;莎士比亚&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 45,</span><br><span class="line">      &quot;end_offset&quot; : 49,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 11</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<h3 id="5-6-模式分词器：pattern-tokenizer"><a href="#5-6-模式分词器：pattern-tokenizer" class="headerlink" title="5.6 模式分词器：pattern tokenizer"></a>5.6 模式分词器：pattern tokenizer</h3><p>模式分词器（pattern tokenizer）允许指定一个任意的模式，将文本切分为分词。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;tokenizer&quot;: &quot;pattern&quot;,</span><br><span class="line">  &quot;text&quot;:&quot;To be or not to be,  That is a question ———— 莎士比亚&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>现在让我们手动定制一个以逗号分隔的分词器。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line">PUT pattern_test2</span><br><span class="line">&#123;</span><br><span class="line">  &quot;settings&quot;: &#123;</span><br><span class="line">    &quot;analysis&quot;: &#123;</span><br><span class="line">      &quot;analyzer&quot;: &#123;</span><br><span class="line">        &quot;my_analyzer&quot;:&#123;</span><br><span class="line">          &quot;tokenizer&quot;:&quot;my_tokenizer&quot;</span><br><span class="line">        &#125;</span><br><span class="line">      &#125;,</span><br><span class="line">      &quot;tokenizer&quot;: &#123;</span><br><span class="line">        &quot;my_tokenizer&quot;:&#123;</span><br><span class="line">          &quot;type&quot;:&quot;pattern&quot;,</span><br><span class="line">          &quot;pattern&quot;:&quot;,&quot;</span><br><span class="line">        &#125;</span><br><span class="line">      &#125;</span><br><span class="line">    &#125;</span><br><span class="line">  &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>上例中，在settings下的自定义分析器my_analyzer中，自定义的模式分词器名叫my_tokenizer；在与自定义分析器同级，为新建的自定义模式分词器设置一些属性，比如以逗号分隔。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">POST pattern_test2&#x2F;_analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;tokenizer&quot;: &quot;my_tokenizer&quot;,</span><br><span class="line">  &quot;text&quot;:&quot;To be or not to be,  That is a question ———— 莎士比亚&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;To be or not to be&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 18,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;  That is a question ———— 莎士比亚&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 19,</span><br><span class="line">      &quot;end_offset&quot; : 49,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 1</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>根据结果可以看到，文档被逗号分割为两部分。</p>
<h3 id="5-7-UAX-URL电子邮件分词器：UAX-RUL-email-tokenizer"><a href="#5-7-UAX-URL电子邮件分词器：UAX-RUL-email-tokenizer" class="headerlink" title="5.7 UAX URL电子邮件分词器：UAX RUL email tokenizer"></a>5.7 UAX URL电子邮件分词器：UAX RUL email tokenizer</h3><p>在处理单个的英文单词的情况下，标准分词器是个非常好的选择，但是现在很多的网站以网址或电子邮件作为结尾，比如我们现在有这样的一个文本：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">作者：张开</span><br><span class="line">来源：未知 </span><br><span class="line">原文：https:&#x2F;&#x2F;www.cnblogs.com&#x2F;Neeo&#x2F;articles&#x2F;10402742.html</span><br><span class="line">邮箱：xxxxxxx@xx.com</span><br><span class="line">版权声明：本文为博主原创文章，转载请附上博文链接！</span><br></pre></td></tr></table></figure>

<p>现在让我们使用标准分词器查看一下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;tokenizer&quot;: &quot;standard&quot;,</span><br><span class="line">  &quot;text&quot;:&quot;作者：张开来源：未知原文：https:&#x2F;&#x2F;www.cnblogs.com&#x2F;Neeo&#x2F;articles&#x2F;10402742.html邮箱：xxxxxxx@xx.com版权声明：本文为博主原创文章，转载请附上博文链接！&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>结果很长：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br><span class="line">129</span><br><span class="line">130</span><br><span class="line">131</span><br><span class="line">132</span><br><span class="line">133</span><br><span class="line">134</span><br><span class="line">135</span><br><span class="line">136</span><br><span class="line">137</span><br><span class="line">138</span><br><span class="line">139</span><br><span class="line">140</span><br><span class="line">141</span><br><span class="line">142</span><br><span class="line">143</span><br><span class="line">144</span><br><span class="line">145</span><br><span class="line">146</span><br><span class="line">147</span><br><span class="line">148</span><br><span class="line">149</span><br><span class="line">150</span><br><span class="line">151</span><br><span class="line">152</span><br><span class="line">153</span><br><span class="line">154</span><br><span class="line">155</span><br><span class="line">156</span><br><span class="line">157</span><br><span class="line">158</span><br><span class="line">159</span><br><span class="line">160</span><br><span class="line">161</span><br><span class="line">162</span><br><span class="line">163</span><br><span class="line">164</span><br><span class="line">165</span><br><span class="line">166</span><br><span class="line">167</span><br><span class="line">168</span><br><span class="line">169</span><br><span class="line">170</span><br><span class="line">171</span><br><span class="line">172</span><br><span class="line">173</span><br><span class="line">174</span><br><span class="line">175</span><br><span class="line">176</span><br><span class="line">177</span><br><span class="line">178</span><br><span class="line">179</span><br><span class="line">180</span><br><span class="line">181</span><br><span class="line">182</span><br><span class="line">183</span><br><span class="line">184</span><br><span class="line">185</span><br><span class="line">186</span><br><span class="line">187</span><br><span class="line">188</span><br><span class="line">189</span><br><span class="line">190</span><br><span class="line">191</span><br><span class="line">192</span><br><span class="line">193</span><br><span class="line">194</span><br><span class="line">195</span><br><span class="line">196</span><br><span class="line">197</span><br><span class="line">198</span><br><span class="line">199</span><br><span class="line">200</span><br><span class="line">201</span><br><span class="line">202</span><br><span class="line">203</span><br><span class="line">204</span><br><span class="line">205</span><br><span class="line">206</span><br><span class="line">207</span><br><span class="line">208</span><br><span class="line">209</span><br><span class="line">210</span><br><span class="line">211</span><br><span class="line">212</span><br><span class="line">213</span><br><span class="line">214</span><br><span class="line">215</span><br><span class="line">216</span><br><span class="line">217</span><br><span class="line">218</span><br><span class="line">219</span><br><span class="line">220</span><br><span class="line">221</span><br><span class="line">222</span><br><span class="line">223</span><br><span class="line">224</span><br><span class="line">225</span><br><span class="line">226</span><br><span class="line">227</span><br><span class="line">228</span><br><span class="line">229</span><br><span class="line">230</span><br><span class="line">231</span><br><span class="line">232</span><br><span class="line">233</span><br><span class="line">234</span><br><span class="line">235</span><br><span class="line">236</span><br><span class="line">237</span><br><span class="line">238</span><br><span class="line">239</span><br><span class="line">240</span><br><span class="line">241</span><br><span class="line">242</span><br><span class="line">243</span><br><span class="line">244</span><br><span class="line">245</span><br><span class="line">246</span><br><span class="line">247</span><br><span class="line">248</span><br><span class="line">249</span><br><span class="line">250</span><br><span class="line">251</span><br><span class="line">252</span><br><span class="line">253</span><br><span class="line">254</span><br><span class="line">255</span><br><span class="line">256</span><br><span class="line">257</span><br><span class="line">258</span><br><span class="line">259</span><br><span class="line">260</span><br><span class="line">261</span><br><span class="line">262</span><br><span class="line">263</span><br><span class="line">264</span><br><span class="line">265</span><br><span class="line">266</span><br><span class="line">267</span><br><span class="line">268</span><br><span class="line">269</span><br><span class="line">270</span><br><span class="line">271</span><br><span class="line">272</span><br><span class="line">273</span><br><span class="line">274</span><br><span class="line">275</span><br><span class="line">276</span><br><span class="line">277</span><br><span class="line">278</span><br><span class="line">279</span><br><span class="line">280</span><br><span class="line">281</span><br><span class="line">282</span><br><span class="line">283</span><br><span class="line">284</span><br><span class="line">285</span><br><span class="line">286</span><br><span class="line">287</span><br><span class="line">288</span><br><span class="line">289</span><br><span class="line">290</span><br><span class="line">291</span><br><span class="line">292</span><br><span class="line">293</span><br><span class="line">294</span><br><span class="line">295</span><br><span class="line">296</span><br><span class="line">297</span><br><span class="line">298</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;作&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 1,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;者&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 1,</span><br><span class="line">      &quot;end_offset&quot; : 2,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 1</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;张&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 3,</span><br><span class="line">      &quot;end_offset&quot; : 4,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 2</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;开&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 4,</span><br><span class="line">      &quot;end_offset&quot; : 5,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 3</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;来&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 5,</span><br><span class="line">      &quot;end_offset&quot; : 6,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 4</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;源&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 6,</span><br><span class="line">      &quot;end_offset&quot; : 7,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 5</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;未&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 8,</span><br><span class="line">      &quot;end_offset&quot; : 9,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 6</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;知&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 9,</span><br><span class="line">      &quot;end_offset&quot; : 10,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 7</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;原&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 10,</span><br><span class="line">      &quot;end_offset&quot; : 11,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 8</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;文&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 11,</span><br><span class="line">      &quot;end_offset&quot; : 12,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 9</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;https&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 13,</span><br><span class="line">      &quot;end_offset&quot; : 18,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 10</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;www.cnblogs.com&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 21,</span><br><span class="line">      &quot;end_offset&quot; : 36,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 11</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;Neeo&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 37,</span><br><span class="line">      &quot;end_offset&quot; : 41,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 12</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;articles&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 42,</span><br><span class="line">      &quot;end_offset&quot; : 50,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 13</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;10402742&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 51,</span><br><span class="line">      &quot;end_offset&quot; : 59,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;NUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 14</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;html&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 60,</span><br><span class="line">      &quot;end_offset&quot; : 64,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 15</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;邮&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 64,</span><br><span class="line">      &quot;end_offset&quot; : 65,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 16</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;箱&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 65,</span><br><span class="line">      &quot;end_offset&quot; : 66,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 17</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;xxxxxxx&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 67,</span><br><span class="line">      &quot;end_offset&quot; : 74,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 18</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;xx.com&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 75,</span><br><span class="line">      &quot;end_offset&quot; : 81,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 19</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;版&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 81,</span><br><span class="line">      &quot;end_offset&quot; : 82,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 20</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;权&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 82,</span><br><span class="line">      &quot;end_offset&quot; : 83,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 21</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;声&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 83,</span><br><span class="line">      &quot;end_offset&quot; : 84,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 22</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;明&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 84,</span><br><span class="line">      &quot;end_offset&quot; : 85,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 23</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;本&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 86,</span><br><span class="line">      &quot;end_offset&quot; : 87,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 24</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;文&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 87,</span><br><span class="line">      &quot;end_offset&quot; : 88,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 25</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;为&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 88,</span><br><span class="line">      &quot;end_offset&quot; : 89,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 26</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;博&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 89,</span><br><span class="line">      &quot;end_offset&quot; : 90,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 27</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;主&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 90,</span><br><span class="line">      &quot;end_offset&quot; : 91,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 28</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;原&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 91,</span><br><span class="line">      &quot;end_offset&quot; : 92,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 29</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;创&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 92,</span><br><span class="line">      &quot;end_offset&quot; : 93,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 30</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;文&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 93,</span><br><span class="line">      &quot;end_offset&quot; : 94,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 31</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;章&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 94,</span><br><span class="line">      &quot;end_offset&quot; : 95,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 32</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;转&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 96,</span><br><span class="line">      &quot;end_offset&quot; : 97,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 33</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;载&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 97,</span><br><span class="line">      &quot;end_offset&quot; : 98,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 34</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;请&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 98,</span><br><span class="line">      &quot;end_offset&quot; : 99,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 35</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;附&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 99,</span><br><span class="line">      &quot;end_offset&quot; : 100,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 36</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;上&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 100,</span><br><span class="line">      &quot;end_offset&quot; : 101,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 37</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;博&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 101,</span><br><span class="line">      &quot;end_offset&quot; : 102,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 38</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;文&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 102,</span><br><span class="line">      &quot;end_offset&quot; : 103,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 39</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;链&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 103,</span><br><span class="line">      &quot;end_offset&quot; : 104,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 40</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;接&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 104,</span><br><span class="line">      &quot;end_offset&quot; : 105,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 41</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>无论如何，这个结果不符合我们的预期，因为把我们的邮箱和网址分的乱七八糟！那么针对这种情况，我们应该使用UAX URL电子邮件分词器（UAX RUL email tokenizer），该分词器将电子邮件和URL都作为单独的分词进行保留。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;tokenizer&quot;: &quot;uax_url_email&quot;,</span><br><span class="line">  &quot;text&quot;:&quot;作者：张开来源：未知原文：https:&#x2F;&#x2F;www.cnblogs.com&#x2F;Neeo&#x2F;articles&#x2F;10402742.html邮箱：xxxxxxx@xx.com版权声明：本文为博主原创文章，转载请附上博文链接！&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br><span class="line">129</span><br><span class="line">130</span><br><span class="line">131</span><br><span class="line">132</span><br><span class="line">133</span><br><span class="line">134</span><br><span class="line">135</span><br><span class="line">136</span><br><span class="line">137</span><br><span class="line">138</span><br><span class="line">139</span><br><span class="line">140</span><br><span class="line">141</span><br><span class="line">142</span><br><span class="line">143</span><br><span class="line">144</span><br><span class="line">145</span><br><span class="line">146</span><br><span class="line">147</span><br><span class="line">148</span><br><span class="line">149</span><br><span class="line">150</span><br><span class="line">151</span><br><span class="line">152</span><br><span class="line">153</span><br><span class="line">154</span><br><span class="line">155</span><br><span class="line">156</span><br><span class="line">157</span><br><span class="line">158</span><br><span class="line">159</span><br><span class="line">160</span><br><span class="line">161</span><br><span class="line">162</span><br><span class="line">163</span><br><span class="line">164</span><br><span class="line">165</span><br><span class="line">166</span><br><span class="line">167</span><br><span class="line">168</span><br><span class="line">169</span><br><span class="line">170</span><br><span class="line">171</span><br><span class="line">172</span><br><span class="line">173</span><br><span class="line">174</span><br><span class="line">175</span><br><span class="line">176</span><br><span class="line">177</span><br><span class="line">178</span><br><span class="line">179</span><br><span class="line">180</span><br><span class="line">181</span><br><span class="line">182</span><br><span class="line">183</span><br><span class="line">184</span><br><span class="line">185</span><br><span class="line">186</span><br><span class="line">187</span><br><span class="line">188</span><br><span class="line">189</span><br><span class="line">190</span><br><span class="line">191</span><br><span class="line">192</span><br><span class="line">193</span><br><span class="line">194</span><br><span class="line">195</span><br><span class="line">196</span><br><span class="line">197</span><br><span class="line">198</span><br><span class="line">199</span><br><span class="line">200</span><br><span class="line">201</span><br><span class="line">202</span><br><span class="line">203</span><br><span class="line">204</span><br><span class="line">205</span><br><span class="line">206</span><br><span class="line">207</span><br><span class="line">208</span><br><span class="line">209</span><br><span class="line">210</span><br><span class="line">211</span><br><span class="line">212</span><br><span class="line">213</span><br><span class="line">214</span><br><span class="line">215</span><br><span class="line">216</span><br><span class="line">217</span><br><span class="line">218</span><br><span class="line">219</span><br><span class="line">220</span><br><span class="line">221</span><br><span class="line">222</span><br><span class="line">223</span><br><span class="line">224</span><br><span class="line">225</span><br><span class="line">226</span><br><span class="line">227</span><br><span class="line">228</span><br><span class="line">229</span><br><span class="line">230</span><br><span class="line">231</span><br><span class="line">232</span><br><span class="line">233</span><br><span class="line">234</span><br><span class="line">235</span><br><span class="line">236</span><br><span class="line">237</span><br><span class="line">238</span><br><span class="line">239</span><br><span class="line">240</span><br><span class="line">241</span><br><span class="line">242</span><br><span class="line">243</span><br><span class="line">244</span><br><span class="line">245</span><br><span class="line">246</span><br><span class="line">247</span><br><span class="line">248</span><br><span class="line">249</span><br><span class="line">250</span><br><span class="line">251</span><br><span class="line">252</span><br><span class="line">253</span><br><span class="line">254</span><br><span class="line">255</span><br><span class="line">256</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;作&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 1,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;者&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 1,</span><br><span class="line">      &quot;end_offset&quot; : 2,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 1</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;张&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 3,</span><br><span class="line">      &quot;end_offset&quot; : 4,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 2</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;开&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 4,</span><br><span class="line">      &quot;end_offset&quot; : 5,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 3</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;来&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 5,</span><br><span class="line">      &quot;end_offset&quot; : 6,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 4</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;源&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 6,</span><br><span class="line">      &quot;end_offset&quot; : 7,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 5</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;未&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 8,</span><br><span class="line">      &quot;end_offset&quot; : 9,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 6</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;知&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 9,</span><br><span class="line">      &quot;end_offset&quot; : 10,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 7</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;原&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 10,</span><br><span class="line">      &quot;end_offset&quot; : 11,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 8</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;文&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 11,</span><br><span class="line">      &quot;end_offset&quot; : 12,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 9</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;https:&#x2F;&#x2F;www.cnblogs.com&#x2F;Neeo&#x2F;articles&#x2F;10402742.html&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 13,</span><br><span class="line">      &quot;end_offset&quot; : 64,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;URL&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 10</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;邮&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 64,</span><br><span class="line">      &quot;end_offset&quot; : 65,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 11</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;箱&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 65,</span><br><span class="line">      &quot;end_offset&quot; : 66,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 12</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;xxxxxxx@xx.com&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 67,</span><br><span class="line">      &quot;end_offset&quot; : 81,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;EMAIL&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 13</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;版&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 81,</span><br><span class="line">      &quot;end_offset&quot; : 82,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 14</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;权&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 82,</span><br><span class="line">      &quot;end_offset&quot; : 83,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 15</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;声&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 83,</span><br><span class="line">      &quot;end_offset&quot; : 84,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 16</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;明&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 84,</span><br><span class="line">      &quot;end_offset&quot; : 85,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 17</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;本&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 86,</span><br><span class="line">      &quot;end_offset&quot; : 87,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 18</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;文&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 87,</span><br><span class="line">      &quot;end_offset&quot; : 88,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 19</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;为&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 88,</span><br><span class="line">      &quot;end_offset&quot; : 89,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 20</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;博&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 89,</span><br><span class="line">      &quot;end_offset&quot; : 90,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 21</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;主&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 90,</span><br><span class="line">      &quot;end_offset&quot; : 91,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 22</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;原&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 91,</span><br><span class="line">      &quot;end_offset&quot; : 92,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 23</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;创&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 92,</span><br><span class="line">      &quot;end_offset&quot; : 93,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 24</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;文&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 93,</span><br><span class="line">      &quot;end_offset&quot; : 94,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 25</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;章&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 94,</span><br><span class="line">      &quot;end_offset&quot; : 95,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 26</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;转&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 96,</span><br><span class="line">      &quot;end_offset&quot; : 97,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 27</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;载&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 97,</span><br><span class="line">      &quot;end_offset&quot; : 98,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 28</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;请&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 98,</span><br><span class="line">      &quot;end_offset&quot; : 99,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 29</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;附&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 99,</span><br><span class="line">      &quot;end_offset&quot; : 100,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 30</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;上&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 100,</span><br><span class="line">      &quot;end_offset&quot; : 101,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 31</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;博&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 101,</span><br><span class="line">      &quot;end_offset&quot; : 102,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 32</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;文&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 102,</span><br><span class="line">      &quot;end_offset&quot; : 103,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 33</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;链&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 103,</span><br><span class="line">      &quot;end_offset&quot; : 104,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 34</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;接&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 104,</span><br><span class="line">      &quot;end_offset&quot; : 105,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;IDEOGRAPHIC&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 35</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<h3 id="5-8-路径层次分词器：path-hierarchy-tokenizer"><a href="#5-8-路径层次分词器：path-hierarchy-tokenizer" class="headerlink" title="5.8 路径层次分词器：path hierarchy tokenizer"></a>5.8 路径层次分词器：path hierarchy tokenizer</h3><p>路径层次分词器（path hierarchy tokenizer）允许以特定的方式索引文件系统的路径，这样在搜索时，共享同样路径的文件将被作为结果返回。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;tokenizer&quot;: &quot;path_hierarchy&quot;,</span><br><span class="line">  &quot;text&quot;:&quot;&#x2F;usr&#x2F;local&#x2F;python&#x2F;python2.7&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>返回结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;&#x2F;usr&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 4,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;&#x2F;usr&#x2F;local&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 10,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;&#x2F;usr&#x2F;local&#x2F;python&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 17,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;&#x2F;usr&#x2F;local&#x2F;python&#x2F;python2.7&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 27,</span><br><span class="line">      &quot;type&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<h2 id="六-分词过滤器"><a href="#六-分词过滤器" class="headerlink" title="六 分词过滤器"></a>六 分词过滤器</h2><p>asticsearch内置很多（真是变态多啊！但一般用不到，美滋滋！！！）的分词过滤器。其中包含分词过滤器和字符过滤器。<br> <strong>常见分词过滤器</strong><br> 这里仅列举几个常见的分词过滤器（token filter）包括：</p>
<ul>
<li>标准分词过滤器（Standard Token Filter）在6.5.0版本弃用。此筛选器已被弃用，将在下一个主要版本中删除。在之前的版本中其实也没干啥，甚至在更老版本的<code>Lucene</code>中，它用于去除单词结尾的<code>s</code>字符，还有不必要的句点字符，但是现在， 连这些小功能都被其他的分词器和分词过滤器顺手干了，真可怜！</li>
<li>ASCII折叠分词过滤器（ASCII Folding Token Filter）将前127个ASCII字符(基本拉丁语的Unicode块)中不包含的字母、数字和符号Unicode字符转换为对应的ASCII字符(如果存在的话）。</li>
<li>扁平图形分词过滤器（Flatten Graph Token  Filter）接受任意图形标记流。例如由同义词图形标记过滤器生成的标记流，并将其展平为适合索引的单个线性标记链。这是一个有损的过程，因为单独的侧路径被压扁在彼此之上，但是如果在索引期间使用图形令牌流是必要的，因为Lucene索引当前不能表示图形。 出于这个原因，最好只在搜索时应用图形分析器，因为这样可以保留完整的图形结构，并为邻近查询提供正确的匹配。该功能在Lucene中为实验性功能。</li>
<li>长度标记过滤器（Length Token Filter）会移除分词流中太长或者太短的标记，它是可配置的，我们可以在settings中设置。</li>
<li>小写分词过滤器（Lowercase Token Filter）将分词规范化为小写，它通过<code>language</code>参数支持希腊语、爱尔兰语和土耳其语小写标记过滤器。</li>
<li>大写分词过滤器（Uppercase Token Filter）将分词规范为大写。</li>
</ul>
<p>其余分词过滤器不一一列举。详情参见<a href="https://www.elastic.co/guide/en/elasticsearch/reference/6.5/index.html" target="_blank" rel="noopener">官网</a>。</p>
<h3 id="6-1-自定义分词过滤器"><a href="#6-1-自定义分词过滤器" class="headerlink" title="6.1 自定义分词过滤器"></a>6.1 自定义分词过滤器</h3><p>接下来我们简单的来学习自定义两个分词过滤器。首先是长度分词过滤器。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">PUT pattern_test3</span><br><span class="line">&#123;</span><br><span class="line">  &quot;settings&quot;: &#123;</span><br><span class="line">    &quot;analysis&quot;: &#123;</span><br><span class="line">      &quot;filter&quot;: &#123;</span><br><span class="line">        &quot;my_test_length&quot;:&#123;</span><br><span class="line">          &quot;type&quot;:&quot;length&quot;,</span><br><span class="line">          &quot;max&quot;:8,</span><br><span class="line">          &quot;min&quot;:2</span><br><span class="line">        &#125;</span><br><span class="line">      &#125;</span><br><span class="line">    &#125;</span><br><span class="line">  &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>上例中，我们自定义了一个长度过滤器，过滤掉长度大于8和小于2的分词。<br> 需要补充的是，max参数表示最大分词长度。默认为Integer.MAX_VALUE，就是2147483647（231−1</p>
<p>），而min则表示最小长度，默认为0。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">POST pattern_test3&#x2F;_analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;tokenizer&quot;: &quot;standard&quot;,</span><br><span class="line">  &quot;filter&quot;: [&quot;my_test_length&quot;],</span><br><span class="line">  &quot;text&quot;:&quot;a Small word and a longerword&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;Small&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 2,</span><br><span class="line">      &quot;end_offset&quot; : 7,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 1</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 8,</span><br><span class="line">      &quot;end_offset&quot; : 12,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 2</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;and&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 13,</span><br><span class="line">      &quot;end_offset&quot; : 16,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 3</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<h3 id="6-2-自定义小写分词过滤器"><a href="#6-2-自定义小写分词过滤器" class="headerlink" title="6.2 自定义小写分词过滤器"></a>6.2 自定义小写分词过滤器</h3><p>自定义一个小写分词过滤器，过滤希腊文：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line">PUT lowercase_example</span><br><span class="line">&#123;</span><br><span class="line">  &quot;settings&quot;: &#123;</span><br><span class="line">    &quot;analysis&quot;: &#123;</span><br><span class="line">      &quot;analyzer&quot;: &#123;</span><br><span class="line">        &quot;standard_lowercase_example&quot;: &#123;</span><br><span class="line">          &quot;type&quot;: &quot;custom&quot;,</span><br><span class="line">          &quot;tokenizer&quot;: &quot;standard&quot;,</span><br><span class="line">          &quot;filter&quot;: [&quot;lowercase&quot;]</span><br><span class="line">        &#125;,</span><br><span class="line">        &quot;greek_lowercase_example&quot;: &#123;</span><br><span class="line">          &quot;type&quot;: &quot;custom&quot;,</span><br><span class="line">          &quot;tokenizer&quot;: &quot;standard&quot;,</span><br><span class="line">          &quot;filter&quot;: [&quot;greek_lowercase&quot;]</span><br><span class="line">        &#125;</span><br><span class="line">      &#125;,</span><br><span class="line">      &quot;filter&quot;: &#123;</span><br><span class="line">        &quot;greek_lowercase&quot;: &#123;</span><br><span class="line">          &quot;type&quot;: &quot;lowercase&quot;,</span><br><span class="line">          &quot;language&quot;: &quot;greek&quot;</span><br><span class="line">        &#125;</span><br><span class="line">      &#125;</span><br><span class="line">    &#125;</span><br><span class="line">  &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>过滤内容是：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">POST lowercase_example&#x2F;_analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;tokenizer&quot;: &quot;standard&quot;,</span><br><span class="line">  &quot;filter&quot;: [&quot;greek_lowercase&quot;],</span><br><span class="line">  &quot;text&quot;:&quot;Ένα φίλτρο διακριτικού τύπου πεζά s ομαλοποιεί το κείμενο διακριτικού σε χαμηλότερη θήκη&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;ενα&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 3,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;φιλτρο&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 4,</span><br><span class="line">      &quot;end_offset&quot; : 10,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 1</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;διακριτικου&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 11,</span><br><span class="line">      &quot;end_offset&quot; : 22,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 2</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;τυπου&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 23,</span><br><span class="line">      &quot;end_offset&quot; : 28,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 3</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;πεζα&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 29,</span><br><span class="line">      &quot;end_offset&quot; : 33,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 4</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;s&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 34,</span><br><span class="line">      &quot;end_offset&quot; : 35,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 5</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;ομαλοποιει&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 36,</span><br><span class="line">      &quot;end_offset&quot; : 46,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 6</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;το&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 47,</span><br><span class="line">      &quot;end_offset&quot; : 49,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 7</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;κειμενο&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 50,</span><br><span class="line">      &quot;end_offset&quot; : 57,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 8</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;διακριτικου&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 58,</span><br><span class="line">      &quot;end_offset&quot; : 69,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 9</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;σε&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 70,</span><br><span class="line">      &quot;end_offset&quot; : 72,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 10</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;χαμηλοτερη&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 73,</span><br><span class="line">      &quot;end_offset&quot; : 83,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 11</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;θηκη&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 84,</span><br><span class="line">      &quot;end_offset&quot; : 88,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 12</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<h3 id="6-3-多个分词过滤器"><a href="#6-3-多个分词过滤器" class="headerlink" title="6.3 多个分词过滤器"></a>6.3 多个分词过滤器</h3><p>除此之外，我们可以使用多个分词过滤器。例如我们在使用长度过滤器时，可以同时使用小写分词过滤器或者更多。</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">POST _analyze</span><br><span class="line">&#123;</span><br><span class="line">  &quot;tokenizer&quot;: &quot;standard&quot;,</span><br><span class="line">  &quot;filter&quot;: [&quot;length&quot;,&quot;lowercase&quot;],</span><br><span class="line">  &quot;text&quot;:&quot;a Small word and a longerword&quot;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

<p>上例中，我们用列表来管理多个分词过滤器。<br> 结果如下：</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">  &quot;tokens&quot; : [</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;a&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 0,</span><br><span class="line">      &quot;end_offset&quot; : 1,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 0</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;small&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 2,</span><br><span class="line">      &quot;end_offset&quot; : 7,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 1</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;word&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 8,</span><br><span class="line">      &quot;end_offset&quot; : 12,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 2</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;and&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 13,</span><br><span class="line">      &quot;end_offset&quot; : 16,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 3</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;a&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 17,</span><br><span class="line">      &quot;end_offset&quot; : 18,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 4</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">      &quot;token&quot; : &quot;longerword&quot;,</span><br><span class="line">      &quot;start_offset&quot; : 19,</span><br><span class="line">      &quot;end_offset&quot; : 29,</span><br><span class="line">      &quot;type&quot; : &quot;&lt;ALPHANUM&gt;&quot;,</span><br><span class="line">      &quot;position&quot; : 5</span><br><span class="line">    &#125;</span><br><span class="line">  ]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>

</section>
    <!-- Tags START -->
    
    <!-- Tags END -->
    <!-- NAV START -->
    
  <div class="nav-container">
    <!-- reverse left and right to put prev and next in a more logic postition -->
    
      <a class="nav-left" href="/db/Elasticsearch%E7%B3%BB%E5%88%97/9-11%E6%96%87%E6%A1%A3%E6%93%8D%E4%BD%9C/12-Elasticsearch%E4%B9%8Bmappings%20parameters/">
        <span class="nav-arrow">← </span>
        
          db/Elasticsearch系列/9-11文档操作/12-Elasticsearch之mappings parameters
        
      </a>
    
    
      <a class="nav-right" href="/db/Elasticsearch%E7%B3%BB%E5%88%97/9-11%E6%96%87%E6%A1%A3%E6%93%8D%E4%BD%9C/14-Elasticsearch%20-%20ik%E5%88%86%E8%AF%8D%E5%99%A8/">
        
          db/Elasticsearch系列/9-11文档操作/14-Elasticsearch - ik分词器
        
        <span class="nav-arrow"> →</span>
      </a>
    
  </div>

    <!-- NAV END -->
    <!-- 打赏 START -->
    
      <div class="money-like">
        <div class="reward-btn">
          赏
          <span class="money-code">
            <span class="alipay-code">
              <div class="code-image"></div>
              <b>使用支付宝打赏</b>
            </span>
            <span class="wechat-code">
              <div class="code-image"></div>
              <b>使用微信打赏</b>
            </span>
          </span>
        </div>
        <p class="notice">点击上方按钮,请我喝杯咖啡！</p>
      </div>
    
    <!-- 打赏 END -->
    <!-- 二维码 START -->
    
      <div class="qrcode">
        <canvas id="share-qrcode"></canvas>
        <p class="notice">扫描二维码，分享此文章</p>
      </div>
    
    <!-- 二维码 END -->
    
      <!-- Gitment START -->
      <div id="comments"></div>
      <!-- Gitment END -->
    
  </article>
  <!-- Article END -->
  <!-- Catalog START -->
  
    <aside class="catalog-container">
  <div class="toc-main">
  <!-- 不蒜子统计 -->
    <strong class="toc-title">目录</strong>
    
      <ol class="toc-nav"><li class="toc-nav-item toc-nav-level-1"><a class="toc-nav-link" href="#13-Elasticsearch-分析过程"><span class="toc-nav-text">13-Elasticsearch - 分析过程</span></a><ol class="toc-nav-child"><li class="toc-nav-item toc-nav-level-2"><a class="toc-nav-link" href="#一-前言"><span class="toc-nav-text">一 前言</span></a></li><li class="toc-nav-item toc-nav-level-2"><a class="toc-nav-link" href="#二分析过程"><span class="toc-nav-text">二分析过程</span></a></li><li class="toc-nav-item toc-nav-level-2"><a class="toc-nav-link" href="#三分析器"><span class="toc-nav-text">三分析器</span></a><ol class="toc-nav-child"><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#3-1-标准分析器：standard-analyzer"><span class="toc-nav-text">3.1 标准分析器：standard analyzer</span></a></li><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#3-2-简单分析器：simple-analyzer"><span class="toc-nav-text">3.2 简单分析器：simple analyzer</span></a></li><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#3-3-空白分析器：whitespace-analyzer"><span class="toc-nav-text">3.3 空白分析器：whitespace analyzer</span></a></li><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#3-4-停用词分析器：stop-analyzer"><span class="toc-nav-text">3.4 停用词分析器：stop analyzer</span></a></li><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#3-5-关键词分析器：keyword-analyzer"><span class="toc-nav-text">3.5 关键词分析器：keyword analyzer</span></a></li><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#3-6-模式分析器：pattern-analyzer"><span class="toc-nav-text">3.6 模式分析器：pattern analyzer</span></a></li><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#3-7-语言和多语言分析器：chinese"><span class="toc-nav-text">3.7 语言和多语言分析器：chinese</span></a></li><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#3-8-雪球分析器：snowball-analyzer"><span class="toc-nav-text">3.8 雪球分析器：snowball analyzer</span></a></li></ol></li><li class="toc-nav-item toc-nav-level-2"><a class="toc-nav-link" href="#四-字符过滤器"><span class="toc-nav-text">四 字符过滤器</span></a><ol class="toc-nav-child"><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#4-1-HTML字符过滤器"><span class="toc-nav-text">4.1 HTML字符过滤器</span></a></li><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#4-2-映射字符过滤器"><span class="toc-nav-text">4.2 映射字符过滤器</span></a></li><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#4-3-模式替换过滤器"><span class="toc-nav-text">4.3 模式替换过滤器</span></a></li></ol></li><li class="toc-nav-item toc-nav-level-2"><a class="toc-nav-link" href="#五-分词器"><span class="toc-nav-text">五 分词器</span></a><ol class="toc-nav-child"><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#5-1-标准分词器：standard-tokenizer"><span class="toc-nav-text">5.1 标准分词器：standard tokenizer</span></a></li><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#5-2-关键词分词器：keyword-tokenizer"><span class="toc-nav-text">5.2 关键词分词器：keyword tokenizer</span></a></li><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#5-3-字母分词器：letter-tokenizer"><span class="toc-nav-text">5.3 字母分词器：letter tokenizer</span></a></li><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#5-4-小写分词器：lowercase-tokenizer"><span class="toc-nav-text">5.4 小写分词器：lowercase tokenizer</span></a></li><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#5-5-空白分词器：whitespace-tokenizer"><span class="toc-nav-text">5.5 空白分词器：whitespace tokenizer</span></a></li><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#5-6-模式分词器：pattern-tokenizer"><span class="toc-nav-text">5.6 模式分词器：pattern tokenizer</span></a></li><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#5-7-UAX-URL电子邮件分词器：UAX-RUL-email-tokenizer"><span class="toc-nav-text">5.7 UAX URL电子邮件分词器：UAX RUL email tokenizer</span></a></li><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#5-8-路径层次分词器：path-hierarchy-tokenizer"><span class="toc-nav-text">5.8 路径层次分词器：path hierarchy tokenizer</span></a></li></ol></li><li class="toc-nav-item toc-nav-level-2"><a class="toc-nav-link" href="#六-分词过滤器"><span class="toc-nav-text">六 分词过滤器</span></a><ol class="toc-nav-child"><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#6-1-自定义分词过滤器"><span class="toc-nav-text">6.1 自定义分词过滤器</span></a></li><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#6-2-自定义小写分词过滤器"><span class="toc-nav-text">6.2 自定义小写分词过滤器</span></a></li><li class="toc-nav-item toc-nav-level-3"><a class="toc-nav-link" href="#6-3-多个分词过滤器"><span class="toc-nav-text">6.3 多个分词过滤器</span></a></li></ol></li></ol></li></ol>
    
  </div>
</aside>
  
  <!-- Catalog END -->
</main>

<script>
  (function () {
    var url = 'http://www.liuqingzheng.top/db/Elasticsearch系列/9-11文档操作/13-Elasticsearch - 分析过程/';
    var banner = ''
    if (banner !== '' && banner !== 'undefined' && banner !== 'null') {
      $('#article-banner').css({
        'background-image': 'url(' + banner + ')'
      })
    } else {
      $('#article-banner').geopattern(url)
    }
    $('.header').removeClass('fixed-header')

    // error image
    $(".markdown-content img").on('error', function() {
      $(this).attr('src', 'http://file.muyutech.com/error-img.png')
      $(this).css({
        'cursor': 'default'
      })
    })

    // zoom image
    $(".markdown-content img").on('click', function() {
      var src = $(this).attr('src')
      if (src !== 'http://file.muyutech.com/error-img.png') {
        var imageW = $(this).width()
        var imageH = $(this).height()

        var zoom = ($(window).width() * 0.95 / imageW).toFixed(2)
        zoom = zoom < 1 ? 1 : zoom
        zoom = zoom > 2 ? 2 : zoom
        var transY = (($(window).height() - imageH) / 2).toFixed(2)

        $('body').append('<div class="image-view-wrap"><div class="image-view-inner"><img src="'+ src +'" /></div></div>')
        $('.image-view-wrap').addClass('wrap-active')
        $('.image-view-wrap img').css({
          'width': `${imageW}`,
          'transform': `translate3d(0, ${transY}px, 0) scale3d(${zoom}, ${zoom}, 1)`
        })
        $('html').css('overflow', 'hidden')

        $('.image-view-wrap').on('click', function() {
          $(this).remove()
          $('html').attr('style', '')
        })
      }
    })
  })();
</script>


  <script>
    var qr = new QRious({
      element: document.getElementById('share-qrcode'),
      value: document.location.href
    });
  </script>



  <script>
    var gitmentConfig = "liuqingzheng";
    if (gitmentConfig !== 'undefined') {
      var gitment = new Gitment({
        id: "db/Elasticsearch系列/9-11文档操作/13-Elasticsearch - 分析过程",
        owner: "liuqingzheng",
        repo: "FuckBlog",
        oauth: {
          client_id: "32a4076431cf39d0ecea",
          client_secret: "94484bd79b3346a949acb2fda3c8a76ce16990c6"
        },
        theme: {
          render(state, instance) {
            const container = document.createElement('div')
            container.lang = "en-US"
            container.className = 'gitment-container gitment-root-container'
            container.appendChild(instance.renderHeader(state, instance))
            container.appendChild(instance.renderEditor(state, instance))
            container.appendChild(instance.renderComments(state, instance))
            container.appendChild(instance.renderFooter(state, instance))
            return container;
          }
        }
      })
      gitment.render(document.getElementById('comments'))
    }
  </script>




    <div class="scroll-top">
  <span class="arrow-icon"></span>
</div>
    <footer class="app-footer">
<!-- 不蒜子统计 -->
<span id="busuanzi_container_site_pv">
     本站总访问量<span id="busuanzi_value_site_pv"></span>次
</span>
<span class="post-meta-divider">|</span>
<span id="busuanzi_container_site_uv" style='display:none'>
     本站访客数<span id="busuanzi_value_site_uv"></span>人
</span>
<script async src="//busuanzi.ibruce.info/busuanzi/2.3/busuanzi.pure.mini.js"></script>



  <p class="copyright">
    &copy; 2021 | Proudly powered by <a href="https://www.cnblogs.com/xiaoyuanqujing" target="_blank">小猿取经</a>
    <br>
    Theme by <a href="https://www.cnblogs.com/xiaoyuanqujing" target="_blank" rel="noopener">小猿取经</a>
  </p>
</footer>

<script>
  function async(u, c) {
    var d = document, t = 'script',
      o = d.createElement(t),
      s = d.getElementsByTagName(t)[0];
    o.src = u;
    if (c) { o.addEventListener('load', function (e) { c(null, e); }, false); }
    s.parentNode.insertBefore(o, s);
  }
</script>
<script>
  async("//cdnjs.cloudflare.com/ajax/libs/fastclick/1.0.6/fastclick.min.js", function(){
    FastClick.attach(document.body);
  })
</script>

<script>
  var hasLine = 'true';
  async("//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js", function(){
    $('figure pre').each(function(i, block) {
      var figure = $(this).parents('figure');
      if (hasLine === 'false') {
        figure.find('.gutter').hide();
      }
      var lang = figure.attr('class').split(' ')[1] || 'code';
      var codeHtml = $(this).html();
      var codeTag = document.createElement('code');
      codeTag.className = lang;
      codeTag.innerHTML = codeHtml;
      $(this).attr('class', '').empty().html(codeTag);
      figure.attr('data-lang', lang.toUpperCase());
      hljs.highlightBlock(block);
    });
  })
</script>





<!-- Baidu Tongji -->

<script>
    var _baId = 'c5fd96eee1193585be191f318c3fa725';
    // Originial
    var _hmt = _hmt || [];
    (function() {
      var hm = document.createElement("script");
      hm.src = "//hm.baidu.com/hm.js?" + _baId;
      var s = document.getElementsByTagName("script")[0];
      s.parentNode.insertBefore(hm, s);
    })();
</script>


<script src="/js/script.js"></script>


<script src="/js/search.js"></script>


<script src="/js/load.js"></script>



  <span class="local-search local-search-google local-search-plugin" style="right: 50px;top: 70px;;position:absolute;z-index:2;">
      <input type="search" placeholder="站内搜索" id="local-search-input" class="local-search-input-cls" style="">
      <div id="local-search-result" class="local-search-result-cls"></div>
  </span>


  </body>
</html>